其他
性能优化技巧:有序归并
1、小数据全内存测试
1. Oracle测试
select
l_year,
sum(volume) as revenu,
sum(l_quantity) as quantity
from
(
select
extract(year from l_shipdate) as l_year,
(l_extendedprice * (1 - l_discount) ) as volume,
l_quantity
from
orderdetail
)
group by
l_year
union all
select
2019 as l_year,
count(o_orderkey) as revenu,
count(o_totalprice) as quantity
from
orders;
(2)有关联测试
select
l_year,
sum(volume) as revenu,
sum(l_quantity) as quantity
from
(
select
extract(year from l_shipdate) as l_year,
(l_extendedprice * (1 - l_discount) ) as volume,
l_quantity
from
orders,
orderdetail
where
o_orderkey = l_orderkey
and l_quantity>0
)
group by
l_year;
查询条件 l_quantity>0 始终为真,没有过滤数据,是为了确保会读取这一列数据。
2. SPL测试
A | |
1 | >orders=file("/home/ctx/orders.ctx").open().memory() |
2 | >orderdetail=file("/home/ctx/orderdetail.ctx").open().memory() |
3 | =now() |
4 | =orderdetail.cursor(L_ORDERKEY,L_EXTENDEDPRICE,L_DISCOUNT,L_SHIPDATE,L_QUANTITY).groups(year(L_SHIPDATE):l_year; sum(L_EXTENDEDPRICE*(1-L_DISCOUNT)):revenue,sum(L_QUANTITY):quantity) |
5 | =orders.groups(;count(O_ORDERKEY),count(O_TOTALPRICE)) |
6 | =interval@s(A3,now()) |
(2)有关联测试
编写SPL脚本如下:
A | |
1 | >orders=file("/home/ctx/orders.ctx").open().memory() |
2 | >orderdetail=file("/home/ctx/orderdetail.ctx").open().memory() |
3 | =now() |
4 | =orders.cursor(O_ORDERKEY,O_TOTALPRICE;O_TOTALPRICE>0) |
5 | =orderdetail.cursor(L_ORDERKEY,L_EXTENDEDPRICE,L_DISCOUNT,L_SHIPDATE,L_QUANTITY) |
6 | =joinx(A5:detail,L_ORDERKEY;A4:orders,O_ORDERKEY) |
7 | =A6.groups(year(detail.L_SHIPDATE):l_year;sum(detail.L_EXTENDEDPRICE*(1-detail.L_DISCOUNT)):revenue, sum(detail.L_QUANTITY):quantity) |
8 | =interval@s(A3,now()) |
A6中的joinx就是有序归并关联函数,要求关联字段都按升序排列。
3. 测试结果及分析
类别 | 无关联 | 有关联 | 变慢倍数 | 关联用时 |
Oracle | 16 | 67 | 4.2 | 51 |
SPL | 14 | 32 | 2.3 | 18 |
每种测试结果都是多次运行、数据充分缓存以后,取最快的一次。
1. Oracle测试
2. SPL测试
A | |
1 | =now() |
2 | =file("/home/ctx/lineitem.ctx").open().cursor@m(L_ORDERKEY,L_EXTENDEDPRICE,L_DISCOUNT,L_SHIPDATE,L_QUANTITY;;8) |
3 | =A2.groups(year(L_SHIPDATE):l_year; sum(L_EXTENDEDPRICE*(1-L_DISCOUNT)):revenue,sum(L_QUANTITY):quantity) |
4 | =file("/home/ctx/orders.ctx").open().cursor@m(O_ORDERKEY,O_TOTALPRICE;;8) |
5 | =A4.total(count(O_ORDERKEY),count(O_TOTALPRICE)) |
6 | =interval@s(A1,now()) |
数据量大,A2和A4都使用8路并行游标读数。
A | |
1 | =now() |
2 | =file("/home/ctx/orders.ctx").open().cursor@m(O_ORDERKEY,O_TOTALPRICE;O_TOTALPRICE>0;8) |
3 | =file("/home/ctx/lineitem.ctx").open().cursor(L_ORDERKEY,L_EXTENDEDPRICE,L_DISCOUNT,L_SHIPDATE,L_QUANTITY;;A2) |
4 | =joinx(A3:detail,L_ORDERKEY;A2:orders,O_ORDERKEY) |
5 | =A4.groups(year(detail.L_SHIPDATE):l_year; sum(detail.L_EXTENDEDPRICE*(1-detail.L_DISCOUNT)):revenue,sum(detail.L_QUANTITY):quantity) |
6 | =interval@s(A1,now()) |
A3中生成游标时用了A2作为参数,表示与主表orders的主键同步分段读取。
3. 测试结果及分析
类别 | 无关联 | 有关联 | 变慢倍数 | 关联用时 |
Oracle | 265 | 863 | 3.3 | 598 |
SPL | 70 | 101 | 1.4 | 31 |
计算量分析及关联用时计算与小数据测试同理。
更多性能优化技巧,可在底部“阅读原文”中查看
重磅!开源SPL交流群成立了
简单好用的SPL开源啦!
为了给感兴趣的小伙伴们提供一个相互交流的平台,
特地开通了交流群(群完全免费,不广告不卖课)
需要进群的朋友,可长按扫描下方二维码
本文感兴趣的朋友,请转到阅读原文去收藏 ^_^